1 Introduction

Covid-19 brought the world to a near-complete stop and continues to disrupt daily life. New variants keep emerging, and governments respond with mandates intended to keep case and death rates low. According to Stephanie Soucheray of CIDRAP, Covid-19 cases in the US dropped by 40% between the first and second weeks of February 2022 (Stephanie Soucheray, 2022). This reduction has led several US states to relax mask mandates and other restrictions.

According to Vivienne Walt, making Covid passports compulsory increased the overall vaccination rate in countries such as France, Italy, Germany and Denmark (Vivienne Walt, Fortune, 2021). The announcement of the rule alone raised vaccination rates from 20 days before the anticipated implementation, with an effect lasting up to 40 days.

It is interesting to examine the effect of these mandates on the case fatality rate (CFR), that is, whether they have truly been effective in slowing the spread of the virus. Countries adopted policies on different timelines; for this study we look at the 10 most populous countries (Anon, Worldometer, 2022), i.e. the United States of America, India, China, Indonesia, Pakistan, Nigeria, Brazil, Bangladesh, Russia, and Mexico. The main question we wish to answer is how workplace closures, international travel bans, and government financial support such as debt relief affected the CFR.

The analysis compares only the periods before and after each mandate was implemented; we do not look at the different phases of a mandate. We study the introduction of each mandate and its immediate effect on the CFR.

This report is organized as follows: We first look at all the datasets available and select only the top 10 countries based on population. After necessary preprocessing, we check for any kind of transformation required for the Case Fatality Rate. We check whether all conditions required for fitting the panel regression model (Wooldridge, J.M., 2011) are met by conducting various statistical tests. Next, we look at the model diagnostics to get insights regarding the effectiveness of the particular mandate in the study.

2 Dataset

To answer the above questions, we need datasets that describe not only the deaths and cases but also the timeline of each mandate, and that can be combined for the analysis conducted in the later parts of the report. We use data from two main sources for this study:

2.1 WHO Covid Data (WHO Data, 2021)

The data is taken from the WHO Health Emergency Dashboard. It is an openly available dataset of the number of cases and deaths recorded over the past two years, arranged as a time series. The table below lists the important variables in the dataset.

| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| Date Reported | Timestamp | Timestamp of data collection |
| Country Code | Factor | Abbreviation of the country name |
| Country | Factor | Country name |
| WHO Region | Factor | Identifies the WHO region |
| New Cases | Quantitative | Number of new cases |
| Cumulative Cases | Quantitative | Sum of cases up to the given date |
| New Deaths | Quantitative | Number of new deaths |
| Cumulative Deaths | Quantitative | Sum of deaths up to the given date |

2.2 OxCGRT Dataset (Oxford Data, 2022)

The Oxford Covid-19 Government Response Tracker (OxCGRT) collects systematic information on policy measures that governments have taken to tackle COVID-19. Policy responses have been tracked since 1 January 2020, cover more than 180 countries, and are coded into 23 indicators, such as school closures, travel restrictions, and vaccination policy. These policies are recorded on a scale to reflect the extent of government action, and scores are aggregated into a suite of policy indices. The data can help decision-makers and citizens understand governmental responses in a consistent way, aiding efforts to fight the pandemic. A few of the mandates that could be interesting to look at are summarized in the table below.

| Variable Name | Variable Data Type | Variable Description |
|---|---|---|
| Date | Timestamp | Timestamp of data collection |
| C1 | Factor | Indicator for School Closures |
| C2 | Factor | Indicator for Workplace Closures |
| C6 | Factor | Indicator for Stay at Home order |
| C8 | Factor | Indicator for International Travel Restrictions |
| E1 | Factor | Indicator for Income Support |
| E2 | Factor | Indicator for Debt/Contract Relief |
| H3 | Factor | Indicator for Contact Tracing |
| H7 | Factor | Indicator for Vaccination Policy |
| V2A | Factor | Indicator for Vaccination Availability |

3 Data Preprocessing

After loading the datasets, the main discrepancy noticed was missing values. Instead of dropping those data points, we replaced each missing value with the average or the most frequently occurring value, depending on whether the variable was numerical or categorical (factor). Next, it was important to standardize the country labels to ensure consistency across datasets. For example, the WHO Covid dataset identifies the United States as “United States”, while the OxCGRT dataset identifies it as “United States of America”. To keep the time window consistent across all countries, the dataset is subset from the start of the data until 31 December 2021. Lastly, we create the response variable, since it is not readily available. The CFR is defined as,

\[ \text{Case Fatality Rate (CFR)} = \frac{\text{New Deaths}}{\text{New Cases}} \]
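The preprocessing steps described in this section can be sketched as follows. This is an illustrative Python sketch, not the code used for the report (which was written in R). All function names and the alias table are hypothetical, and returning no value for days with zero new cases is an assumption, since the report does not say how such days were handled. The weekly averaging at the end anticipates the weekly average CFR used in the modelling section.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean

# Hypothetical alias table mapping one dataset's label onto the other's.
COUNTRY_ALIASES = {"United States": "United States of America"}


def standardize_country(name):
    """Return the common label for a country (unchanged if no alias exists)."""
    return COUNTRY_ALIASES.get(name, name)


def impute(values):
    """Fill None entries: mean for numeric columns, mode for categorical ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to derive a fill value from
    if all(isinstance(v, (int, float)) for v in observed):
        fill = mean(observed)
    else:
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def daily_cfr(new_deaths, new_cases):
    """CFR = new deaths / new cases; None when the ratio is undefined."""
    if new_cases <= 0:
        return None  # assumption: days without new cases are left out
    return new_deaths / new_cases


def weekly_average_cfr(records):
    """Average the daily CFR per (country, ISO year, ISO week).

    `records` is an iterable of (country, date, new_deaths, new_cases).
    """
    buckets = defaultdict(list)
    for country, day, deaths, cases in records:
        cfr = daily_cfr(deaths, cases)
        if cfr is not None:
            iso_year, iso_week, _ = day.isocalendar()
            key = (standardize_country(country), iso_year, iso_week)
            buckets[key].append(cfr)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


records = [
    ("United States", date(2021, 1, 4), 2, 100),
    ("United States", date(2021, 1, 5), 4, 100),
    ("United States", date(2021, 1, 6), 0, 0),  # no new cases: skipped
]
print(weekly_average_cfr(records))
```

Keeping the ratio undefined on zero-case days avoids division by zero and avoids treating "no data" as a CFR of 0; a different convention would change the weekly averages.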

4 Exploratory Data Analysis

The purpose of performing EDA is to gain insights into the datasets and to check whether the trends we plot support our intuition. A series of plots is generated to examine our general hypothesis that the mandates have a reducing effect on the CFR. To visualize this, we create boxplots comparing the CFR before and after the introduction of each mandate. The figure below shows the boxplots for all the mandates studied in this report for the United States.

We can clearly see a reduction in the CFR for international travel restrictions and debt/contract relief. There seems to be no marked difference in the CFR before and after the vaccine became available. Surprisingly, the CFR increases after the introduction of income support. One possible explanation is that people went out to stock up on resources using the financial support provided by the federal government.

We can also look at the CFR trends for all the countries studied in this project. The figure below shows these trends as a line plot over time.

In this plot, China has the highest CFR of all the countries in 2020, while the other countries have a low CFR, very close to 0. Mexico has the most erratic CFR trend, with fatality being very high from June 2020 compared with the other countries studied. The subsequent dip from China's high CFR could be due to the very strict lockdown that was implemented during the first wave of the pandemic.

We can also look at how the number of cases changes over time and contrast it with the CFR plot above. This helps us identify whether deaths were actually falling over time or whether CFR is not an appropriate response variable for this study.

For all countries, cases increased significantly while the CFR stayed low, especially from June 2021 onwards. The spike in the number of cases in this period, with the United States recording the most cases, can be attributed to the Omicron variant. This variant had a very low fatality rate but a high transmission rate, leading to a high number of cases but fewer deaths.

Lastly, we can look at the effect the pandemic has had on the world as a whole. The map below shows the cumulative deaths to date. The USA has had the most deaths, followed by Brazil, India and Russia. Surprisingly, China has fewer deaths than the other countries.

5 Statistical Modelling

One approach to this kind of problem would be to divide the dataset into the various Covid waves and analyze them separately with a regression model. In econometrics this technique is known as panel regression. In a panel regression setting, the time-series data is divided into comparable samples and a linear regression is fit on those samples. We choose this method because it lets us analyze the CFR before and after the implementation of a particular mandate by the national governments.

Panel regression is a modeling method adapted to panel data, also called longitudinal data or cross-sectional time-series data. It is widely used in econometrics, where the behavior of statistical units (i.e. panel units) is followed across time. Those units can be firms, countries, states, etc. Panel regression allows controlling both for the panel unit effect and for the time effect when estimating regression coefficients.
Panel data regression is a powerful way to control for the influence of unobserved independent variables on a dependent variable, which can lead to biased estimators in traditional linear regression models.
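To illustrate how controlling for the panel unit effect works, the following Python sketch implements the fixed-effects ("within") estimator for a single regressor. It assumes a fixed-effects specification, which the report does not state explicitly, and the function name `within_estimator` is hypothetical. It is not the plm() implementation used for the report.

```python
from collections import defaultdict


def within_estimator(rows):
    """Fixed-effects (within) slope for one regressor.

    `rows` is a list of (unit, x, y). Subtracting each unit's own mean
    from x and y removes any constant unit-specific effect; the slope is
    then ordinary least squares on the demeaned data.
    """
    by_unit = defaultdict(list)
    for unit, x, y in rows:
        by_unit[unit].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for obs in by_unit.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    if sxx == 0:
        raise ValueError("regressor does not vary within any unit")
    return sxy / sxx


# Two units with very different baselines but the same slope of 2.
rows = [("A", 0, 10), ("A", 1, 12), ("A", 2, 14),
        ("B", 1, -3), ("B", 2, -1), ("B", 3, 1)]
print(within_estimator(rows))
```

Because each unit is compared only with its own average, differences in baseline level between units do not bias the slope, which is the property described in the paragraph above.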

In general, a Panel regression equation is given by,

\[Y_{i,t} = X_{i,t}\beta + \alpha_i + \epsilon_{i,t}\]

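where \(Y_{i,t}\) is the response for panel unit \(i\) at time \(t\), \(X_{i,t}\) is the vector of explanatory variables, \(\beta\) is the vector of regression coefficients, \(\alpha_i\) is the unobserved unit-specific effect, and \(\epsilon_{i,t}\) is the error term.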

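For this model, a few assumptions need to be met for the inferences drawn from the model to be interpretable: a linear relationship between the explanatory variables and the response, approximately normally distributed residuals, no serial correlation in the errors, and constant error variance (homoskedasticity).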

From the above plots, we can be reasonably confident of a linear relationship between the indicators and the CFR. To test the other assumptions, we run various tests in the Sensitivity Analysis section below. To allow interpretability, we model the weekly average CFR, aggregated in such a way that all countries have a similar amount of data. The proposed model for this study is given by,

\[ Y_{avg_{CFR},i,week} = \beta_0 X_{vaccineAvailability} + \beta_1 X_{incomeSupport} + \beta_2 X_{debtRelief} + \beta_3 X_{internationalTravel} + \epsilon_{i,week} \]

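where \(Y_{avg_{CFR},i,week}\) is the average CFR of country \(i\) in a given week, each \(X\) is the indicator for the corresponding mandate (vaccine availability, income support, debt relief, and international travel restrictions), the \(\beta\) are the coefficients to be estimated, and \(\epsilon_{i,week}\) is the error term.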

The above model is fit using the plm() function in R, which sets up panel regression models based on a categorical index and a time index. For this report, the indices are the country and the week being studied. The summary of the model described above is shown in the table below.

| | Dependent variable: avg_CFR |
|---|---|
| vaccine_availability2 | -0.010** (0.004) |
| income_support2 | -0.006 (0.006) |
| debt_relief2 | -0.015*** (0.006) |
| international_travel2 | 0.042*** (0.009) |
| Observations | 985 |
| R2 | 0.025 |
| Adjusted R2 | 0.013 |
| F Statistic | 6.337*** (df = 4; 972) |

Note: *p<0.1; **p<0.05; ***p<0.01

One thing that stands out is that the international travel ban was not effective in reducing the CFR; in fact, it is associated with an increase in the CFR (0.042), which is not consistent with the boxplots in the EDA section above. Income support has almost no effect on the CFR, and its coefficient is not statistically significant. We get the expected negative coefficients for vaccine availability and debt relief. Debt relief has the larger effect of the two: with the introduction of this mandate, the overall effect across all countries is a reduction of 0.015 in the CFR, while vaccine availability is associated with a reduction of 0.010. These coefficients may seem small, but it is important to compare them with the range of the CFR itself. From the summary statistics, the CFR values lie mainly in the lower ranges, within \([0, 0.14]\).
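As a rough check on how to read the regression table, the Python sketch below computes an approximate t statistic and 95% interval for each coefficient. It assumes the values in parentheses in the table are standard errors (the default for this kind of summary table) and uses a normal approximation with the rounded table values, so the results are approximate. The function name `t_and_interval` is hypothetical.

```python
# Coefficients and (assumed) standard errors copied from the model summary table.
ESTIMATES = {
    "vaccine_availability2": (-0.010, 0.004),
    "income_support2": (-0.006, 0.006),
    "debt_relief2": (-0.015, 0.006),
    "international_travel2": (0.042, 0.009),
}


def t_and_interval(coef, se, z=1.96):
    """Return (t statistic, lower bound, upper bound), normal approximation."""
    return coef / se, coef - z * se, coef + z * se


for name, (coef, se) in ESTIMATES.items():
    t, low, high = t_and_interval(coef, se)
    print(f"{name:24s} t = {t:6.2f}   95% CI = [{low:.3f}, {high:.3f}]")
```

For income support the interval includes zero, which matches the absence of significance stars in the table, while the intervals for the other three coefficients exclude zero.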

Causal inference is the process of determining the independent, actual effect of a particular phenomenon that is a component of a larger system. The main difference between causal inference and inference of association is that causal inference analyzes the response of an effect variable when a cause of the effect variable is changed. The science of why things occur is called etiology. Causal inference is said to provide the evidence of causality theorized by causal reasoning. The WHO COVID-19 dataset studied here is a time series of observational data, not randomly sampled data. Therefore, the results and conclusions of this report describe general associations rather than causal effects.

6 Sensitivity Analysis

Now it is important to check whether the assumptions mentioned above are met by the model. We begin with the distribution of the residuals.

The residuals are slightly right-skewed. We also observe outliers in the dataset; however, they do not seem influential. Overall, the residuals roughly follow a normal distribution with a slight right skew.
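A numerical complement to reading skew off a plot is the sample skewness of the residuals, where a positive value indicates a right skew. The Python sketch below uses the simple moment-based definition; the function name `sample_skewness` and the example values are hypothetical, and this is not a computation performed in the report.

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment / (variance ** 1.5)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("values are constant")
    return m3 / m2 ** 1.5


# Illustrative residuals with one large positive value (a right tail).
print(sample_skewness([-1.0, -0.5, 0.0, 0.5, 4.0]))
```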

Next, to check for serial correlation, we use the Breusch-Godfrey/Wooldridge test (Wooldridge, 2014). It is used to assess the validity of some of the modelling assumptions inherent in applying regression-like models to observed data series. In particular, it tests for the presence of serial correlation that has not been included in a proposed model structure and which, if present, would mean that incorrect conclusions would be drawn from other tests or that sub-optimal estimates of model parameters would be obtained. We also run White’s test (Black, Hashimzade and Myles, 2017) to check for constant variance.

For the Breusch-Godfrey test, the hypotheses are set up as follows:

\[ H_0 : \text{There is no autocorrelation at any order} \]

\[ H_a : \text{There is autocorrelation at some order} \]

Using pbgtest() from the plm package, we run the Breusch-Godfrey test to check for autocorrelation. The resulting p-value is \(< 2.2 \times 10^{-16}\). Since this is far below any conventional significance level, we reject the null hypothesis \(H_0\) and conclude that there is evidence of serial correlation in the residuals.
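For intuition about what serial correlation in residuals looks like, the Python sketch below computes the Durbin-Watson statistic for a single time-ordered residual series. This is a simpler, related diagnostic for first-order autocorrelation, not the Breusch-Godfrey test run with pbgtest(), and the function name `durbin_watson` is hypothetical. Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for one time-ordered residual series."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    squared_total = sum(e ** 2 for e in residuals)
    if squared_total == 0:
        raise ValueError("all residuals are zero")
    # Sum of squared differences between consecutive residuals.
    squared_steps = sum(
        (later - earlier) ** 2
        for earlier, later in zip(residuals, residuals[1:])
    )
    return squared_steps / squared_total


# Slowly drifting residuals (positive autocorrelation) give a small value.
print(durbin_watson([0.5, 0.6, 0.7, 0.6, 0.5, 0.4]))
```

In a panel setting the statistic would be computed per country, since consecutive rows from different countries are not consecutive in time.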

Next, we check for unequal variances using White’s test. The test is set up as follows:

\[ H_0 : \text{The variances for the errors are equal. } \]

\[ H_a : \text{The variances for the errors are unequal. } \]

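Using bptest() from the lmtest package, we run the test to check for homoskedasticity. (bptest() implements the Breusch-Pagan test; White’s test is the special case in which the auxiliary regression includes the squares and cross-products of the regressors.) The resulting p-value is \(0.0002\). Since this is below any conventional significance level, we reject the null hypothesis \(H_0\) and conclude that the error variances are unequal.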
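The idea behind this family of tests can be sketched in Python for the simplest case of one regressor: regress the squared residuals on the regressor and check whether the fit explains anything. The sketch below is the studentized (Koenker) form of the Breusch-Pagan test, written from the textbook definition rather than taken from lmtest; the function names are hypothetical, and the p-value formula is specific to one degree of freedom.

```python
import math


def _fitted_values(x, y):
    """Fitted values from a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("regressor is constant")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * xi for xi in x]


def breusch_pagan(x, y):
    """Return (LM statistic, p-value) for one regressor.

    LM = n * R^2 of the regression of squared residuals on x, compared
    with a chi-square distribution with 1 degree of freedom.
    """
    fitted = _fitted_values(x, y)
    sq_resid = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    mean_sq = sum(sq_resid) / len(sq_resid)
    ss_total = sum((u - mean_sq) ** 2 for u in sq_resid)
    if ss_total == 0:
        return 0.0, 1.0  # constant squared residuals: no evidence against H0
    aux_fitted = _fitted_values(x, sq_resid)
    ss_resid = sum((u - f) ** 2 for u, f in zip(sq_resid, aux_fitted))
    lm = max(len(x) * (1 - ss_resid / ss_total), 0.0)
    # Chi-square(1) survival function: P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value


# Noise that grows with x should produce a small p-value.
x = list(range(1, 21))
y = [xi + 0.5 * xi * (-1) ** xi for xi in x]
print(breusch_pagan(x, y))
```

A small p-value means the squared residuals vary systematically with the regressor, so \(H_0\) of equal variances is rejected.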

7 Conclusion & Future Scope

Based on the panel regression model we fit, we can assess the effectiveness of a few of the mandates being studied. To summarise the findings, restriction of international travel gave an unexpected result: the restrictions are associated with an increase in the case fatality rate. Income support does not seem to have any significant effect on the CFR. Vaccine availability and debt relief are both associated with a significant reduction in the CFR. One interpretation is that vaccine availability led people to get vaccinated early, thus reducing fatality. Similarly, debt relief could have encouraged people to work from home, thus reducing the transmission rate.

The residuals are approximately normal, but the very small p-values from the serial correlation and heteroskedasticity tests indicate that those two assumptions are not met, so the reported significance levels should be treated with caution. In addition, the results obtained from the model cannot be fully reconciled with the patterns observed in the visualizations. Variables that were not considered are likely to make the results of the model inaccurate. In the next phase, one focus is to identify variables that help explain the effects of the mandates better. One such variable could be the continent of each country, as that would help in grouping similar countries together. We can also try fitting an ANOVA model to see the effect of these mandates more clearly and gain more insight into these variables.

8 References

Soucheray, S., 2022. Covid-19 cases drop by 40% in US. CIDRAP News, 14 February. Available at: https://www.cidrap.umn.edu/news-perspective/2022/02/covid-19-cases-drop-40-us [Accessed March 10, 2022].

Walt, V., 2021. Covid-19 vaccine mandates and passports work to increase jab rates-sometimes spectacularly. Fortune. Available at: https://fortune.com/2021/12/14/covid-19-vaccine-mandates-passports-increase-jab-rates-lancet-study-finds/ [Accessed March 3, 2022].

Anon, Countries in the world by population (2022). Worldometer. Available at: https://www.worldometers.info/world-population/population-by-country/ [Accessed March 10, 2022].

Wooldridge, J.M., 2011. Econometric analysis of cross section and panel data, Cambridge, MA: MIT.

WHO Data, WHO Coronavirus (COVID-19) Dashboard. Available at: https://covid19.who.int/WHO-COVID-19-global-table-data.csv [Accessed March 4, 2022].

OxCGRT, OXCGRT/Covid-policy-tracker: Systematic dataset of covid-19 policy, from Oxford University. GitHub. Available at: https://github.com/OxCGRT/covid-policy-tracker [Accessed March 10, 2022].

Born, B. & Breitung, J., 2014. Testing for serial correlation in fixed-effects panel data models. Econometric Reviews, 35(7), pp.1290–1316.

Black, J., Hashimzade, N. & Myles, G.D., 2017. A dictionary of economics, Oxford: Oxford University Press.

9 Acknowledgement

Members of team 5 provided help, and the improvements suggested during the draft discussion session were incorporated.

10 Session Information

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22000)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252   
## [3] LC_MONETARY=English_India.1252 LC_NUMERIC=C                  
## [5] LC_TIME=English_India.1252    
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] stargazer_5.2.2  plotly_4.10.0    gridExtra_2.3    sjPlot_2.8.10   
##  [5] Cairo_1.5-14     plm_2.6-0        car_3.0-11       carData_3.0-4   
##  [9] lmtest_0.9-39    zoo_1.8-9        foreign_0.8-81   lubridate_1.7.10
## [13] kableExtra_1.3.4 Hmisc_4.6-0      Formula_1.2-4    survival_3.2-11 
## [17] lattice_0.20-44  knitr_1.36       forcats_0.5.1    stringr_1.4.0   
## [21] dplyr_1.0.7      purrr_0.3.4      readr_2.0.1      tidyr_1.1.3     
## [25] tibble_3.1.4     ggplot2_3.3.5    tidyverse_1.3.1 
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1        backports_1.2.1     miscTools_0.6-26   
##   [4] systemfonts_1.0.3   lazyeval_0.2.2      splines_4.1.1      
##   [7] crosstalk_1.1.1     TH.data_1.1-0       digest_0.6.27      
##  [10] htmltools_0.5.2     fansi_0.5.0         magrittr_2.0.1     
##  [13] checkmate_2.0.0     cluster_2.1.2       tzdb_0.1.2         
##  [16] openxlsx_4.2.4      modelr_0.1.8        vroom_1.5.5        
##  [19] sandwich_3.0-1      svglite_2.0.0       bdsmatrix_1.3-4    
##  [22] jpeg_0.1-9          colorspace_2.0-2    rvest_1.0.1        
##  [25] haven_2.4.3         rbibutils_2.2.4     xfun_0.26          
##  [28] crayon_1.4.1        jsonlite_1.7.2      lme4_1.1-27.1      
##  [31] glue_1.4.2          gtable_0.3.0        emmeans_1.7.2      
##  [34] webshot_0.5.2       sjstats_0.18.1      sjmisc_2.8.9       
##  [37] maxLik_1.5-2        abind_1.4-5         scales_1.1.1       
##  [40] mvtnorm_1.1-3       DBI_1.1.1           ggeffects_1.1.1    
##  [43] Rcpp_1.0.7          viridisLite_0.4.0   xtable_1.8-4       
##  [46] performance_0.8.0   htmlTable_2.3.0     bit_4.0.4          
##  [49] collapse_1.7.6      datawizard_0.3.0    htmlwidgets_1.5.4  
##  [52] httr_1.4.2          RColorBrewer_1.1-2  ellipsis_0.3.2     
##  [55] pkgconfig_2.0.3     farver_2.1.0        nnet_7.3-16        
##  [58] sass_0.4.0          dbplyr_2.1.1        utf8_1.2.2         
##  [61] tidyselect_1.1.1    labeling_0.4.2      rlang_0.4.11       
##  [64] effectsize_0.6.0.1  munsell_0.5.0       cellranger_1.1.0   
##  [67] tools_4.1.1         cli_3.0.1           generics_0.1.0     
##  [70] sjlabelled_1.1.8    broom_0.7.12        evaluate_0.14      
##  [73] fastmap_1.1.0       yaml_2.2.1          bit64_4.0.5        
##  [76] fs_1.5.0            zip_2.2.0           nlme_3.1-152       
##  [79] xml2_1.3.2          compiler_4.1.1      rstudioapi_0.13    
##  [82] curl_4.3.2          png_0.1-7           reprex_2.0.1       
##  [85] bslib_0.3.1         stringi_1.7.4       highr_0.9          
##  [88] parameters_0.16.0   Matrix_1.3-4        nloptr_1.2.2.2     
##  [91] vctrs_0.3.8         pillar_1.6.2        lifecycle_1.0.0    
##  [94] Rdpack_2.1.2        jquerylib_0.1.4     estimability_1.3   
##  [97] data.table_1.14.0   insight_0.16.0      R6_2.5.1           
## [100] latticeExtra_0.6-29 rio_0.5.27          codetools_0.2-18   
## [103] boot_1.3-28         MASS_7.3-54         assertthat_0.2.1   
## [106] withr_2.4.3         multcomp_1.4-17     bayestestR_0.11.5  
## [109] parallel_4.1.1      hms_1.1.0           rpart_4.1-15       
## [112] minqa_1.2.4         rmarkdown_2.11      base64enc_0.1-3